18 research outputs found

    Scalable structural index construction for JSON analytics

    JavaScript Object Notation (JSON) and its variants have gained great popularity in recent years. Unfortunately, the performance of JSON analytics is often dragged down by expensive JSON parsing. To address this, recent work has shown that building bitwise indices on JSON data, called structural indices, can greatly accelerate querying. Despite its promise, existing structural index construction does not scale well as records become larger and more complex, because its construction process is inherently sequential and involves costly memory copies that grow with the nesting level. To address these issues, this work introduces Pison, a more memory-efficient structural index constructor that supports intra-record parallelism. First, Pison features a redesign of the bottleneck step in the existing solution. The new design is not only simpler but also more memory-efficient. More importantly, Pison can build structural indices for a single bulky record in parallel, enabled by a group of customized parallelization techniques. Finally, Pison is also optimized for better data locality, which is especially critical when processing bulky records. Our evaluation using real-world JSON datasets shows that, on a 16-core machine, Pison achieves a 9.8× speedup (on average) over the existing structural index construction solution for bulky records and a 4.6× speedup (on average) in end-to-end performance (indexing plus querying) over a state-of-the-art SIMD-based JSON parser.
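    To make the idea of a structural index concrete, the following is a minimal sketch, not Pison's actual algorithm. It scans a record one character at a time (the real systems work on whole machine words with bitwise and SIMD operations) and builds one bitmap per nesting level that marks the positions of colons outside string literals. The function names `build_colon_bitmaps` and `positions` are hypothetical.

```python
def build_colon_bitmaps(record: str) -> list[int]:
    """Return one integer bitmap per nesting level.

    Bit i of bitmaps[level] is set when record[i] is a ':' that sits
    outside any string literal at that nesting level.
    Assumption: a plain sequential scan, used only for illustration.
    """
    bitmaps: list[int] = []
    level = -1          # -1 means "outside the top-level object/array"
    in_string = False
    escaped = False
    for pos, ch in enumerate(record):
        if in_string:
            if escaped:
                escaped = False        # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False      # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            level += 1
            if level == len(bitmaps):
                bitmaps.append(0)      # first time this depth is reached
        elif ch in "}]":
            level -= 1
        elif ch == ":" and level >= 0:
            bitmaps[level] |= 1 << pos
    return bitmaps


def positions(bitmap: int) -> list[int]:
    """List the bit positions that are set in a bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


record = '{"a":1,"b":{"c":"x:y"}}'
index = build_colon_bitmaps(record)
print([positions(b) for b in index])
```

    A query for a top-level field then only needs to visit the colon positions recorded for level 0 instead of parsing the whole record; the colon inside the string "x:y" is not indexed.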

    Exploring Scalable Parallelization for Edit Distance-Based Motif Search

    Motif searching is an important problem that can reveal crucial information from biological data. Since general motif searching is NP-hard and the volume of biological data has been growing exponentially in recent years, there is a pressing need for time- and space-efficient algorithms to find motifs. In this paper, we explore scalable parallelization for Edit Distance-Based Motif Search (EMS). We introduce two parallel designs: recursEMS, which integrates the existing EMS solver into a parallel recursion tree running in multiple processes, and parEMS, which presents a novel thread-based method that avoids storing redundant motif candidates. To make the parallel designs practical, we implement SPEMS, a Scalability-sensitive Parallel solver for EMS. For any given biological dataset and search instance, SPEMS can provide an EMS parallelization that targets optimal performance, or one with sub-optimal performance that is more space-efficient. Evaluations on two real-world DNA datasets, TRANSFAC and ChIP-seq, show that, when running on a 48-core machine, SPEMS can obtain a 10× geometric mean speedup over the state of the art at the expense of no less than 74.7% memory overhead, or provide a 2.2× geometric mean speedup with the possibility of consuming less memory.
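    As a point of reference for what an EMS instance asks, here is a brute-force sketch. It is not recursEMS, parEMS, or SPEMS. It assumes the common formulation in which a string of length l is a motif if every input sequence contains some substring within edit distance d of it, and it enumerates all candidates over the DNA alphabet, which is exponential in l. The function names are hypothetical.

```python
from itertools import product


def min_substring_edit_distance(motif: str, text: str) -> int:
    """Smallest edit distance between motif and any substring of text.

    Standard dynamic program with a zero-initialised first row, so a
    match may start at any position in text.
    """
    prev = [0] * (len(text) + 1)
    for i, m in enumerate(motif, start=1):
        cur = [i]  # motif[:i] against the empty substring
        for j, t in enumerate(text, start=1):
            cur.append(min(prev[j] + 1,               # drop a motif character
                           cur[j - 1] + 1,            # absorb an extra text character
                           prev[j - 1] + (m != t)))   # match or substitute
        prev = cur
    return min(prev)


def brute_force_ems(sequences: list[str], length: int, d: int,
                    alphabet: str = "ACGT") -> list[str]:
    """Return every candidate of the given length that is within edit
    distance d of some substring of each sequence (assumed EMS definition)."""
    motifs = []
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(min_substring_edit_distance(candidate, s) <= d for s in sequences):
            motifs.append(candidate)
    return motifs


print(brute_force_ems(["ACGT", "TACG"], length=3, d=0))
```

    The candidate loop is independent per candidate, which is the kind of structure a parallel solver can exploit; the memory cost of storing candidates is what the abstract's space-efficiency trade-off refers to.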

    GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs

    Finite State Machines (FSMs) play a critical role in many real-world applications, ranging from pattern matching to network security. In recent years, significant research efforts have been made to accelerate FSM computations on different parallel platforms, including multicores, GPUs, and DRAM-based accelerators. A popular direction is speculation-centric parallelization. Despite the abundance of such schemes and their promising results, the benefits of speculation-centric FSM parallelization on GPUs depend heavily on high speculation accuracy and are greatly limited by inefficient sequential recovery. Inspired by the speculative data forwarding used in Thread Level Speculation (TLS), this work addresses the existing bottlenecks by introducing speculative recovery with two heuristics for thread scheduling, which can effectively remove redundant computations and increase GPU thread utilization. To maximize the performance of running FSMs on GPUs, this work integrates different speculative parallelization schemes into a latency-sensitive framework, GSpecPal, along with a scheme selector that aims to automatically configure the optimal GPU-based parallelization for a given FSM. Evaluation on a set of real-world FSMs with diverse characteristics confirms the effectiveness of GSpecPal. Experimental results show that GSpecPal can obtain a 7.2× speedup on average (up to 20×) over the state of the art on an Nvidia GeForce RTX 3090 GPU.
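    The baseline that this abstract criticises, speculation followed by sequential recovery, can be sketched as follows. This is a CPU-side illustration in plain Python, not GSpecPal and not GPU code; the transition-table layout, the guessed start states, and the function names are all assumptions made for the example.

```python
def run_from(table, state, chunk):
    """Run the FSM over one chunk of input, starting from the given state."""
    for symbol in chunk:
        state = table[state][symbol]
    return state


def run_fsm_speculatively(table, start, chunks, guesses):
    """Speculative chunked execution with sequential recovery.

    guesses[i] is the predicted start state of chunks[i].
    Returns (final_state, number_of_chunks_reprocessed).
    """
    # Phase 1: every chunk runs from its guessed start state. The runs are
    # independent, so a parallel platform could execute them concurrently.
    predicted_ends = [run_from(table, g, c) for g, c in zip(guesses, chunks)]

    # Phase 2: validate in order; a wrong guess forces the chunk to be
    # reprocessed from the true state (the sequential recovery bottleneck).
    state = start
    reprocessed = 0
    for guess, end, chunk in zip(guesses, predicted_ends, chunks):
        if guess == state:
            state = end
        else:
            state = run_from(table, state, chunk)
            reprocessed += 1
    return state, reprocessed


# Two-state FSM tracking the parity of 'a' symbols (0 = even, 1 = odd).
parity = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(run_fsm_speculatively(parity, 0, ["ab", "aa", "b"], [0, 0, 0]))
print(run_fsm_speculatively(parity, 0, ["ab", "aa", "b"], [0, 1, 1]))
```

    With poor guesses, two of the three chunks are reprocessed one after another; with accurate guesses, none are. That dependence on speculation accuracy is the limitation the abstract describes.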

    Scalable FSM parallelization via path fusion and higher-order speculation

    The finite-state machine (FSM) is a fundamental computation model used by many applications. However, FSM execution is known to be embarrassingly sequential due to the state dependences among transitions. Existing solutions leverage enumerative or speculative parallelization to break the dependences. However, the efficiency of both parallelization schemes depends heavily on the properties of the FSM and its inputs. For those exhibiting unfavorable properties, the former suffers from the overhead of maintaining multiple execution paths, while the latter is bottlenecked by serial reprocessing in misspeculation cases. Either way, the scalability of FSM parallelization is seriously compromised. This work addresses the above scalability challenges with two novel techniques. First, for enumerative parallelization, it proposes path fusion. Inspired by the classic NFA-to-DFA conversion, it maps a vector of states in the original FSM to a new (fused) state. In this way, path fusion can reduce multiple FSM execution paths into a single path, minimizing the overhead of path maintenance. Second, for speculative parallelization, this work introduces higher-order speculation to avoid serial reprocessing during validation. This is a generalized speculation model that allows speculated states to be validated speculatively. Finally, this work integrates different schemes of FSM parallelization into a framework, BoostFSM, which automatically selects the best scheme based on the relevant properties of the FSM. Evaluation using real-world FSMs with diverse characteristics shows that BoostFSM can raise the average speedup from 3.1× and 15.4× for the existing speculative and enumerative parallelization schemes, respectively, to 25.8× on a 64-core machine.
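    Enumerative parallelization can be illustrated with a small sketch: each chunk is run from every possible start state, producing a vector of end states, and the vectors are then chained. The state vector carried through a chunk is the object that path fusion, as described above, would map to a single fused state; this sketch does not implement that mapping or BoostFSM, and the table layout and function names are assumptions.

```python
def run_chunk_enumeratively(table, num_states, chunk):
    """Run one chunk from every start state at once.

    Returns a tuple v where v[s] is the end state reached when the chunk
    is entered in state s. Maintaining this whole vector per symbol is
    the path-maintenance overhead of the enumerative scheme.
    """
    vector = tuple(range(num_states))
    for symbol in chunk:
        vector = tuple(table[s][symbol] for s in vector)
    return vector


def run_fsm_enumeratively(table, num_states, start, data, num_chunks):
    """Split the input, process chunks independently, then chain results."""
    size = max(1, -(-len(data) // num_chunks))   # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each call below is independent of the others and could run in parallel.
    vectors = [run_chunk_enumeratively(table, num_states, c) for c in chunks]
    state = start
    for vector in vectors:
        state = vector[state]                    # cheap lookup per chunk
    return state


# Two-state FSM tracking the parity of 'a' symbols (0 = even, 1 = odd).
parity = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(run_fsm_enumeratively(parity, 2, 0, "abaab", num_chunks=3))
```

    No chunk is ever reprocessed, but each one does work proportional to the number of states, which is why the cost of this scheme depends on the properties of the FSM.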

    Research on the Influence of Nonmorphological Elements’ Cognition on Architectural Design Education in Universities: Third Year Architecture Core Studio in Special Topics “Urban Village Renovation Design”

    This study focuses on the topic of “Urban Village Renovation Design” under the complex and diversified social needs in the third year of the architecture undergraduate program at Zhejiang University, China. Based on the theory and method of “situational teaching,” this study proposes a teaching framework integrating the investigation and cognition of nonmorphological elements, such as historical background, economic structure, social structure, public service, and human needs. The study aims to reveal the analysis and response of site investigation and architectural programming to social needs in the realistic context, and take nonmorphological elements as one of the important factors to promote the rationality and authenticity of architectural design, standardize the teaching process in the form of the teaching framework, and realize the teaching goal of solving social needs by design. Qualitative analyses are used to evaluate whether the proposed teaching framework achieves the expected teaching effects according to Bloom’s Taxonomy. We then use the Kirkpatrick model to quantitatively evaluate the specific effects of the framework, and the differences in the positive effects of nonmorphological elements on teaching are explored. In addition, regression analysis is used to discuss ways of obtaining nonmorphological elements. The results show that the teaching framework is a feasible method to improve students’ understanding of social problems and implement reasonable architectural programming that integrates nonmorphological elements in the architectural design course. To some extent, this teaching framework addresses the neglect of nonmorphological elements in traditional Chinese architectural design teaching, and forms an experience-based teaching methodology that can be used to guide architectural design teaching on other topics. This study is helpful in exploring the value and potential of nonmorphological elements in architectural design and provides a reference for college teachers engaged in architectural programming and design teaching.